Multimedia Signal Processing and Its Applications

Mark Liao 廖宏源

Institute of Information Science, Academia Sinica

 

Abstract

In this talk I will focus on the following topics. First, I will briefly introduce the Multimedia Technologies Laboratory of the Institute of Information Science, Academia Sinica, and its related research. Since 2004, the laboratory has participated in the academia-based technology development project of the Computer Vision Research Center at National Chiao Tung University, and I will present some of the results of the past three years, including the automatic extraction of human motion postures and its application to crime prevention, and the extraction of motion trajectories for real-time detection of illegal events. Finally, I will describe how concepts from the human visual system can be used to rapidly segment 3D objects and extract their salient components as a basis for fast 3D object recognition.


Mining Abnormal Instances in Complex Networks

Shou-De Lin 林守德

Department of Computer Science and Information Engineering, National Taiwan University

 

Abstract

An important research problem in social network mining is to identify abnormal or suspicious instances in it. In this talk, I will describe an intelligent, unsupervised system that is capable of identifying and explaining (in natural language) abnormal or suspicious individuals from a social or semantic network. The potential applications include homeland security (i.e. identifying terrorists), fraud detection, and scientific discovery (e.g. in the domains of biology, chemistry, and literature).


Independent Component Analysis via Copula Techniques

Ray-Bing Chen 陳瑞彬

Institute of Statistics, National University of Kaohsiung

 

Abstract

Independent component analysis (ICA) is a modern factor analysis tool for multivariate data. Given p-dimensional data, we search for the linear combination of the data that creates (almost) independent components. Here copulas are used to model the p-dimensional data, and the independent components are found by optimizing the copula parameters.


Linear and Nonlinear Independent Component Analysis by Leave-one-out Gadaline Approximation

Jiann-Ming Wu 吳建銘

Department of Applied Mathematics, National Dong Hwa University

 

Abstract

Under the assumption that the given multi-channel observations are post-nonlinear mixtures of independent sources, this work presents a novel method for linear and nonlinear independent component analysis. A network of multiple generalized adalines (gadalines) is used to emulate the formation of the multi-channel observations, where each weighted gadaline serves as a transmitting link from the independent sources to a single channel. Leave-one-out gadaline approximation is then applied to solve the inverse problem of concurrently estimating the network parameters and the independent components from the given multi-channel observations. The approximation is operated under a mean-field-annealing process to ensure its computational accuracy. At each step, a single channel is selected and its corresponding independent component is refined to compensate for the error of approximating the selected channel by all the other independent components. Numerical simulations show that leave-one-out gadaline approximation is effective for linear independent component analysis and for blind separation of post-nonlinear mixtures of independent sources.


Mapping Single-trial EEG Records on the Cortical Surface Through the Electromagnetic Spatiotemporal ICA (EMSICA) Method

Arthur C. Tsai 蔡志鑫

Institute of Statistical Science, Academia Sinica

 

Abstract

Event-related potentials (ERPs) induced by visual perception and cognitive tasks have been extensively studied in neuropsychological experiments. ERP activities time-locked to stimulus presentation and task performance are often observed separately at individual scalp channels based on averaged time series across epochs and experimental subjects. An analysis using averaged EEG dynamics could discount information regarding interdependency between on-going EEG and salient ERP features. Advanced tools such as independent component analysis (ICA) have been developed for decomposing collections of single-trial EEG records into separate features. Those features (or independent components) can then be mapped onto the cortical surface using source localization algorithms to visualize brain activation maps and to study between-subject consistency. In this study, we propose a statistical framework for estimating the time course of spatiotemporally independent EEG components simultaneously with their cortical distributions. Within this framework, we implemented Bayesian spatiotemporal analysis for imaging the sources of EEG features on the cortical surface. The framework allows researchers to include prior knowledge regarding spatial locations as well as spatiotemporal independence of different EEG sources. The use of the Electromagnetic Spatiotemporal ICA (EMSICA) method is illustrated by mapping event-related EEG dynamics induced by events in a visual two-back continuous performance task.


Random Estimate Methods for Training Conditional Random Fields On-line

Han-Shen Huang 黃漢申

Institute of Information Science, Academia Sinica

 

Abstract

Conditional random fields (CRFs) have become one of the most widely used solutions for sequential data classification. For applications with consecutively arriving training examples, on-line learning has the potential to achieve a likelihood as high as off-line learning without scanning all available training examples, and it usually has a much smaller memory footprint. To train CRFs on-line, we propose random estimate on-line methods (REOM), a class of on-line algorithms that formulates each on-line parameter estimate as a random variable. The advantage of this formulation is that we can easily derive useful properties of the mean and variance of the on-line estimates to guide the learning to convergence. We experimentally confirm the derived properties and show that REOM converges an order of magnitude faster than Stochastic Meta-Descent (SMD) for on-line CRF training.


Large-scale Ranking by Paired Comparisons

Ruby Chiu-Hsing Weng 翁久幸

Department of Statistics, National Chengchi University

 

Abstract

Ranking a small or moderate number of objects has been well studied; however, ranking millions of objects poses many challenges. This paper targets Internet applications such as on-line voting for photos, foods, and many other items. We consider a paired comparison approach and investigate the difficulties that arise in large-scale scenarios. As using too many pairs for comparison leads to computational and memory difficulties, we propose using only a small subset of pairs. We develop efficient algorithms to obtain the ranking and demonstrate careful selections of pairs. Experiments show that our approach is superior to the averaging method used by most paired-comparison websites.
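The paired-comparison idea above can be illustrated with the classical Bradley-Terry model, fitted here by simple minorization-maximization (MM) updates from only a small list of comparison outcomes. This is a generic sketch, not the paper's algorithm; the items and toy outcomes are invented for illustration.

```python
from collections import defaultdict

def bradley_terry(pairs, n_items, iters=100):
    """Fit Bradley-Terry skills from (winner, loser) pairs via MM updates.

    pairs: list of (winner, loser) index tuples; only a sparse subset of
    all possible pairs is needed.
    """
    wins = defaultdict(int)    # wins[i] = number of wins of item i
    games = defaultdict(int)   # games[(i, j)] = comparisons between i and j
    for w, l in pairs:
        wins[w] += 1
        games[(min(w, l), max(w, l))] += 1
    skill = [1.0] * n_items
    for _ in range(iters):
        new = skill[:]
        for i in range(n_items):
            denom = 0.0
            for (a, b), n in games.items():
                if i in (a, b):
                    j = b if i == a else a
                    denom += n / (skill[i] + skill[j])
            if denom > 0:
                new[i] = wins[i] / denom
        # Normalize so the skills sum to 1 (the overall scale is unidentifiable).
        s = sum(new)
        skill = [v / s for v in new]
    return skill

# Toy outcomes: item 0 beats everyone, item 2 loses to everyone.
pairs = [(0, 1), (0, 2), (1, 2), (0, 1)]
skill = bradley_terry(pairs, 3)
ranking = sorted(range(3), key=lambda i: -skill[i])
```

Because each item's update only touches the pairs it appears in, using a small subset of pairs keeps both computation and memory modest, which is the large-scale concern the abstract raises.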


Trust Region Newton Method for Large-scale Logistic Regression

Chih-Jen Lin 林智仁

Department of Computer Science and Information Engineering, National Taiwan University

 

Abstract

Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this talk, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi-Newton approach for logistic regression. We also compare it with existing linear SVM implementations.
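The Newton machinery behind the method can be sketched on a one-feature L2-regularized logistic regression, where the Hessian is a scalar and the Newton step is exact. This is a plain Newton iteration, not the trust region variant of the talk, and the toy data are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_logreg(xs, ys, C=1.0, iters=20):
    """Newton's method for one-feature L2-regularized logistic regression.

    Minimizes f(w) = 0.5*w**2 + C * sum(log(1 + exp(-y*w*x))), y in {-1,+1}.
    """
    w = 0.0
    for _ in range(iters):
        grad = w + C * sum(-y * x * sigmoid(-y * w * x) for x, y in zip(xs, ys))
        # The regularizer keeps the Hessian >= 1, so the step is well defined.
        hess = 1.0 + C * sum(x * x * s * (1 - s)
                             for x, y in zip(xs, ys)
                             for s in [sigmoid(y * w * x)])
        w -= grad / hess
    return w

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]
w = newton_logreg(xs, ys)
# At the optimum the regularized gradient vanishes.
grad_at_w = w + sum(-y * x * sigmoid(-y * w * x) for x, y in zip(xs, ys))
```

A trust region method would additionally restrict each step to a region where the quadratic model is trusted, which matters once w is a high-dimensional vector and the Hessian solve itself is approximate.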


DNA Signal Detection by Dependency Graphs and Their Expanded Bayesian Networks

Chung-Chin Lu 呂忠津

Department of Electrical Engineering, National Tsing Hua University

 

Abstract

Thanks to the complete sequencing of the human and many other genomes, huge amounts of DNA sequence data have been accumulated. An important issue in bioinformatics is how to predict the complete structure of genes from genomic DNA sequences. When we look at the DNA sequence of a gene, it seems to be a random sequence of the four nucleotides A, T, G, and C. Since cells manipulate genes essentially on the basis of the DNA sequence, it is believed that the structure of a gene is essentially determined by its DNA sequence.

A crucial part of gene structure prediction is detecting the various structural or functional elements of a gene, such as promoter elements, the transcription start site, the polyadenylation/cleavage site, and splice sites. Gene structure prediction can thus be regarded as an attempt to define precisely how the basic biochemical processes of transcription, RNA processing, and translation depend on the sequence. The sequence properties of known genes may offer clues about the intrinsic mechanisms of these processes. The cell recognizes a gene by utilizing different proteins to bind different signals. Typically, several DNA segments are required for a particular signal; we call these segments the members of the signal. However, not every member of a signal has a consensus sequence. We assume that the differences between sequences for a member of a signal arose from a common ancestor via a stochastic process, which suggests that constructing statistical models for signals is reasonable.

In this talk, the DNA signal detection problem will be tackled through the construction of a dependency graph for each DNA signal and the expansion of this dependency graph to a Bayesian network.
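As a concrete illustration of the simplest dependency graph, a chain, the sketch below trains a position-specific first-order model on a few hypothetical aligned signal sequences and scores candidates by log-likelihood. The training sequences, pseudocount, and motif are invented; the talk's dependency graphs and their expanded Bayesian networks allow far richer (including non-adjacent) dependencies.

```python
import math

# Hypothetical aligned examples of one signal member; a real model
# would be trained on many known sites.
training = ["GTAAGT", "GTGAGT", "GTAAGA", "GTGAGG"]

def train_chain(seqs, alphabet="ACGT", pseudo=0.5):
    """Position-specific chain model: P(s) = P(s1) * prod_i P(s_i | s_{i-1}).

    A chain is the simplest dependency graph; a Bayesian network would
    allow an arbitrary parent set at each position.
    """
    L = len(seqs[0])
    first = {a: pseudo for a in alphabet}
    trans = [{(a, b): pseudo for a in alphabet for b in alphabet}
             for _ in range(L - 1)]
    for s in seqs:
        first[s[0]] += 1
        for i in range(1, L):
            trans[i - 1][(s[i - 1], s[i])] += 1
    return first, trans

def log_score(seq, model, alphabet="ACGT"):
    """Log-likelihood of a candidate sequence under the chain model."""
    first, trans = model
    score = math.log(first[seq[0]] / sum(first.values()))
    for i in range(1, len(seq)):
        row = sum(trans[i - 1][(seq[i - 1], b)] for b in alphabet)
        score += math.log(trans[i - 1][(seq[i - 1], seq[i])] / row)
    return score

model = train_chain(training)
good = log_score("GTAAGT", model)  # matches the signal pattern
bad = log_score("TTTTTT", model)   # does not
```

Signal detection then amounts to sliding such a scorer along a genomic sequence and thresholding the score, with the pseudocounts guarding against zero probabilities for unseen transitions.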


Confronting or Detouring the Semantic Gap? A Perspective from Image/Video Retrieval

Winston H. Hsu 徐宏民

Graduate Institute of Networking and Multimedia, National Taiwan University

 

Abstract

Image and video retrieval has been an active research area thanks to the continuing growth of online video data, digital photos, and 24-hour broadcast news videos. Such research problems are very challenging and attract both theoretical and industrial interest. In essence, the major obstacle is the semantic gap, the gap between raw signals and high-level semantics. In this talk, we will first exemplify state-of-the-art approaches that try to confront the semantic gap, ranging from semantic concept detection and large-scale concept ontologies to concept search. We will then highlight a promising avenue for image/video search that leverages recurrent patterns, commonly observed across large-scale distributed sources, to improve the initial (imperfect) text search results. When evaluated on the TRECVID 2005 video benchmark, the proposed approach improves retrieval by up to 32% on average relative to the baseline text search method. Most of all, the proposed method does not require any additional input from users (e.g., example images), complex search models for special queries (e.g., named person search), or time-consuming concept models.


Bayesian Adaptive LASSO for Regression Models

Yi-Liang Tung 童宜亮

Department of Environmental and Occupational Health, Medical College, National Cheng Kung University

 

Abstract

The focus of this talk is the selection of informative covariates in regression models. We introduce a novel variable selection method, referred to as the Bayesian adaptive LASSO. The proposal is motivated by a particular hierarchical Bayesian model that provides adaptive information for identifying the important covariates to be included in the final model. Unlike other approaches, we do not directly use the posterior model probability for variable selection. Instead, we use the posterior information to construct an estimation criterion that shares the key feature of the adaptive LASSO: the penalties on unimportant covariates should be larger than those on important covariates. In particular, we design an efficient MCMC algorithm to handle data sets involving a large number of covariates. Alternatively, the Bayesian adaptive LASSO can be solved by an efficient algorithm for the LASSO. We compare the proposed method with several Bayesian competitors through simulation experiments. The results show that the proposed method performs better in terms of prediction accuracy and relative model error. We also discuss extensions of the Bayesian adaptive LASSO to the generalized linear model and Cox's proportional hazards model. Finally, the proposed method is illustrated with real examples.


Ensemble Classifiers and Their Applications*

James J. Chen 陳章榮

National Center for Toxicological Research

U.S. Food and Drug Administration, Jefferson, AR 72079

 

Abstract

Building a classification model from thousands of available predictor variables with a relatively small sample size tends to be quite unstable. When the number of samples is much smaller than the number of predictors (high-dimensional data), there can be a multiplicity of good classification models. Furthermore, when the class sizes are not equal (imbalanced data), many classification rules will favor the majority class, resulting in high prediction accuracy on the majority class and low prediction accuracy on the minority class. An ensemble classifier is formed by a set of base classifiers, and it makes an overall prediction based on either the majority vote or the averaged prediction of the base classifiers. We present a classification algorithm called CERP (Classification by Ensembles from Random Partitions) for high-dimensional data prediction, and ensemble classifiers that use a re-sampling technique to generate multiple classifiers with equal class sizes for imbalanced data prediction.
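The majority-vote combination rule can be sketched in a few lines. The threshold classifiers below are stand-ins for base classifiers trained on random partitions (as in CERP) or on balanced re-samples; they and the inputs are invented for illustration.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine base classifiers by a majority vote over their predictions."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Toy base classifiers: thresholds on a 1-D input, each imagined as
# trained on a different random partition of the predictors.
classifiers = [
    lambda x: 1 if x > 0.4 else 0,
    lambda x: 1 if x > 0.5 else 0,
    lambda x: 1 if x > 0.6 else 0,
]
pred = majority_vote(classifiers, 0.55)  # two of the three vote 1
```

The same combiner works unchanged whether the base classifiers come from random partitions of a high-dimensional predictor set or from repeated balanced re-samples of an imbalanced training set.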

 

 

* The views presented are those of the author and do not necessarily represent those of the U.S. Food and Drug Administration.


Investigating Compartmental Model Structures Using Sparse Bayesian Learning

John Aston 艾詩敦

Institute of Statistical Science, Academia Sinica

 

Abstract

A method is presented for the analysis of compartmental models using sparse Bayesian learning, with specific application to dynamic positron emission tomography data. Parameters are estimated in a compartmental framework using an overcomplete exponential basis set and sparse Bayesian learning. The number of compartments in the model therefore need not be specified and can be estimated from the data. The technique is applicable to analyses with differing input functions and produces estimates of the system's macro-parameters and model order. In addition, the Bayesian approach returns the posterior distributions, allowing characterisation of the errors associated with the macro-parameters. The method is applied to the estimation of parametric images in neuroreceptor radioligand studies.


Controlling the False Discovery Rate for the SAM Method

Chen-An Tsai 蔡政安

Institute of Statistical Science, Academia Sinica

 

Abstract

Significance Analysis of Microarrays (SAM), proposed by Tusher, Tibshirani and Chu (2001), is nowadays a standard statistical procedure for detecting differentially expressed genes in microarray studies. Given a threshold Δ on the deviation between a t-like statistic and its empirical expectation, an estimated false discovery rate (FDR) is reported in addition to the conclusion of significance. However, the deviation between the statistic and its expectation is not easy to interpret as a conventional error measure, and in practice researchers often find it difficult to determine Δ. SAM suggests trying several different Δ's in the analysis and using the result that corresponds to an adequate FDR level. In this paper, we propose a SAM-based approach in which the level of the per-comparison error rate (PCER) is specified instead of Δ. The new approach applies the kernel quantile estimation method to the resampled data to improve the efficiency of the sample quantiles. To control the FDR of a conclusion, the BH step-up multiple testing procedure is utilized. Simulation studies show that the proposed approach achieves adaptive control of the FDR in various settings. The proposed approach is demonstrated with a real microarray dataset.
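The BH step-up procedure mentioned above is simple to state: sort the m p-values, find the largest k with p_(k) <= (k/m)q, and reject the hypotheses with the k smallest p-values. A minimal sketch, with invented p-values:

```python
def benjamini_hochberg(pvalues, q=0.05):
    """BH step-up procedure controlling the FDR at level q.

    Returns the set of indices of rejected hypotheses: the k smallest
    p-values, where k is the largest rank with p_(k) <= (k/m)*q.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k = rank
    return set(order[:k])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(pvals, q=0.05)
```

In a microarray analysis each p-value would correspond to one gene, so the rejected set is the list of genes declared differentially expressed at FDR level q.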


L2 Boosting and Model Selection

YuFen Huang 黃郁芬 and Yu-Pai Huang 黃宇白

Department of Mathematics, National Chung Cheng University

 

Abstract

In Bühlmann and Yu's (2005) work, when a model selection criterion is used to estimate the stopping iteration for L2 Boosting, it is necessary to compute all boosting iterations under consideration on the training data. The main purpose of this talk is therefore to study an early stopping rule for L2 Boosting during the training stage that yields very substantial computational savings. We propose a change-point detection method on the model selection criterion to stop the boosting iterations on the training data earlier. We also extend these approaches to L2 Boosting for classification problems. Simulation studies and a real data example are provided for illustration.

 

Keywords: L2 Boosting.
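Componentwise L2 Boosting itself is compact enough to sketch. The early stop below is a deliberately crude patience rule on the training loss, not the change-point method proposed in the talk; the toy design matrix and response (y = 2*x1 - x2, with x3 pure noise) are invented.

```python
def fit_coef(residual, feature):
    """Least-squares coefficient of the residual on a single feature."""
    num = sum(r * x for r, x in zip(residual, feature))
    den = sum(x * x for x in feature)
    return num / den if den > 0 else 0.0

def l2_boost(X, y, nu=0.1, max_iter=2000, tol=1e-8, patience=5):
    """Componentwise L2 Boosting: each round, refit the residuals with the
    single best feature and take a small step nu towards that fit. Training
    stops once the loss improvement stays below tol for `patience`
    consecutive rounds (a cruder rule than change-point detection)."""
    n, p = len(y), len(X[0])
    coef = [0.0] * p
    residual = list(y)
    loss = sum(r * r for r in residual) / n
    calm = 0
    for it in range(max_iter):
        # Pick the feature whose refit most reduces the residual sum of squares.
        best_j, best_b, best_loss = 0, 0.0, None
        for j in range(p):
            col = [row[j] for row in X]
            b = fit_coef(residual, col)
            l = sum((r - b * x) ** 2 for r, x in zip(residual, col)) / n
            if best_loss is None or l < best_loss:
                best_j, best_b, best_loss = j, b, l
        col = [row[best_j] for row in X]
        coef[best_j] += nu * best_b
        residual = [r - nu * best_b * x for r, x in zip(residual, col)]
        new_loss = sum(r * r for r in residual) / n
        calm = calm + 1 if loss - new_loss < tol else 0
        loss = new_loss
        if calm >= patience:
            break
    return coef, loss, it + 1

# y depends only on the first two of the three features.
X = [[1.0, 0.0, 0.3], [0.0, 1.0, -0.2], [1.0, 1.0, 0.5], [-1.0, 0.5, 0.1]]
y = [2.0, -1.0, 1.0, -2.5]
coef, loss, stopped_at = l2_boost(X, y)
```

The point of any early stopping rule here is that `stopped_at` is reached without ever computing the later iterations, which is exactly the saving the talk seeks over evaluating the model selection criterion on the full boosting path.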

 


Boosting under Model Assumptions

W. Drago Chen 陳武強1,2 and C. Andy Tsao 曹振海2

1Lan Yang Institute of Technology

2Department of Applied Mathematics, National Dong Hwa University

 

Abstract

In this talk, the population version of discrete Boosting prediction is regarded as a Bayesian estimation problem under normal-normal settings. Under this framework, a greedy Newton-like iterative algorithm is derived with respect to the exponential criterion. It is shown that the Bayes procedure can be obtained as a limit of this iterative algorithm. The algorithm is then contrasted with the population AdaBoost of Friedman, Hastie and Tibshirani (2000) through simulations, and their theoretical properties are compared. Furthermore, we extend this approach to more general functions of the parameters as the response, and to high-dimensional explanatory variables.

 

 

Keywords: Boosting, population version, loss approximation, Bayesian optimization

 


Simple but Powerful? An Empirical Study of Regressions as Base Learners for AdaBoost

Jian-Hong Lin 林建宏 and C. Andy Tsao 曹振海

Department of Applied Mathematics, National Dong Hwa University

 

Abstract

The choice of base learners affects the performance of boosting. While tree-based base learners such as decision stumps have been widely recommended and used in practice, many of their tunings and settings are less familiar to users. In this study, we investigate the performance of naive regressions as base learners for AdaBoost under regression-like settings. Our empirical studies suggest that naive regressions, as base learners, achieve comparable or even better testing errors than tree-based base learners when employed under regression-like settings. However, naive regressions are not unanimously preferred when applied to benchmark data sets.

 

 

Keywords: Boosting, base learner, classification tree, naive regression.
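For readers less familiar with stump base learners, the sketch below is a minimal AdaBoost with decision stumps on a 1-D toy set that no single stump can fit. It is generic AdaBoost, not the regression base learners studied in the talk, and the data are invented.

```python
import math

def best_stump(xs, ys, weights):
    """Weighted-error-minimizing stump: predict s if x > t else -s."""
    best = None
    for t in [min(xs) - 1.0] + sorted(set(xs)):
        for s in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (s if x > t else -s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []                         # (alpha, threshold, sign) triples
    for _ in range(rounds):
        err, t, s = best_stump(xs, ys, weights)
        err = max(err, 1e-10)             # guard against a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Up-weight the points the current stump got wrong.
        weights = [w * math.exp(-alpha * y * (s if x > t else -s))
                   for x, y, w in zip(xs, ys, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(a * (s if x > t else -s) for a, t, s in ensemble)
    return 1 if score > 0 else -1

# Labels +, -, -, + along the line: not separable by one threshold,
# but a weighted sum of stumps fits it after a few rounds.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1, -1, -1, 1]
model = adaboost(xs, ys, rounds=5)
preds = [predict(model, x) for x in xs]
```

Swapping `best_stump` for a naive regression fitted to the weighted sample is essentially the comparison the talk investigates empirically.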


AdaBoost-based Paired Feature Learning and Its Applications

Shang-Hong Lai 賴尚宏

Department of Computer Science, National Tsing Hua University

 

Abstract

In this talk, I will present a modified AdaBoost algorithm for paired feature learning. Our modifications to the AdaBoost algorithm consist of using paired features for the weak classifiers, employing an ID3 decision tree for feature quantization, and applying Bayesian probability to the weak classifier outputs. We have successfully applied this framework to various applications, including face detection, pedestrian detection, image retrieval, and vertebra detection in spinal MRI. Our experimental results show that the modified AdaBoost algorithm is very accurate and efficient for practical applications.


Association and Temporal Mining for Post-filtering of Semantic Concept Detection in Video

Yung-Yu Chuang 莊永裕

Department of Computer Science and Information Engineering, National Taiwan University

 

Abstract

Concept-based retrieval has recently been proposed to deal with the semantic gap in video indexing. In this paper, we propose a general post-filtering framework that enhances the robustness and accuracy of semantic concept detection using concept association and temporal analysis. We propose strategies to combine related concept classifiers based on discovered concept association rules. In addition, we exploit the temporal coherence of a single concept among correlated shots to improve the detection accuracy of a concept detector. We demonstrate the performance of these techniques on the TRECVID 2005 dataset. Our framework is shown to be both efficient and effective in improving the quality of concept-based video retrieval.


Direct Energy Minimization for Super-resolution on Nonlinear Manifolds

Tyng-Luh Liu 劉庭祿

Institute of Information Science, Academia Sinica

 

Abstract

In this work, we address the problem of single-image super-resolution by exploring manifold properties. Given a set of low-resolution image patches and their corresponding high-resolution patches, we assume that they reside on two nonlinear manifolds with similar locally linear structure. This manifold correlation can be realized by a three-layer Markov network that casts super-resolution as energy minimization. The main advantage of our approach is that, by working directly with the network model, there is no need to actually construct the mappings for the underlying manifolds. To achieve such efficiency, we establish an energy minimization model for the network that directly accounts for the expected property entailed by the manifold assumption. The resulting energy function has two nice properties for super-resolution. First, the function is convex, so the optimization can be done efficiently. Second, it can be shown to be an upper bound on the reconstruction error of our algorithm. Thus, minimizing the energy function automatically guarantees a lower reconstruction error, an important characteristic for producing stable super-resolution results.